Simultaneous Reliability Evaluation of Generality and Accuracy for Rule Discovery in Databases

Author

  • Einoshin Suzuki
Abstract

This paper presents an algorithm for discovering highly reliable conjunction rules from data sets. The discovery of conjunction rules, each of which is a restricted form of a production rule, is well motivated by various useful applications such as semantic query optimization and the automatic development of knowledge bases. In a discovery algorithm, a production rule is evaluated according to its generality and accuracy, since these are widely accepted criteria in learning from examples. Evaluating the reliability of both criteria is therefore essential for distinguishing reliable rules from unreliable patterns without overwhelming users with spurious output. However, previous discovery approaches have either ignored reliability evaluation or evaluated only the reliability of generality, and consequently tend to discover a huge number of rules. To circumvent these difficulties, we propose an approach based on simultaneous reliability estimation of generality and accuracy. Our approach discovers the rules whose generality and accuracy exceed pre-specified thresholds with high reliability. A novel pruning method is employed to improve time efficiency without changing the discovery outcome. The proposed approach has been validated experimentally on 21 benchmark data sets from the UCI repository.
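The abstract does not spell out the estimation procedure, so the following is a minimal sketch under our own assumptions, not the author's method. It assumes the generality of a rule y → x is Pr(y), estimated by n(y)/n, and its accuracy is Pr(x | y), estimated by n(x ∧ y)/n(y). As a swapped-in technique, it approximates a simultaneous guarantee with a Bonferroni split of the error δ across two one-sided Wilson-score lower bounds, treating the accuracy bound as conditional on the observed n(y). The names `rule_counts`, `wilson_lower`, `is_reliable_rule`, `theta_g`, `theta_a` and `delta` are hypothetical.

```python
import math
from statistics import NormalDist


def rule_counts(records, premise, conclusion):
    """Count n, n(y) and n(x and y) for the rule y -> x (hypothetical helper).

    records:    list of dicts mapping attribute -> value
    premise:    dict of attribute -> value; the conjunction y (all must hold)
    conclusion: (attribute, value) pair; the single atom x
    """
    attr, val = conclusion
    n_y = n_xy = 0
    for r in records:
        if all(r.get(a) == v for a, v in premise.items()):
            n_y += 1
            if r.get(attr) == val:
                n_xy += 1
    return len(records), n_y, n_xy


def wilson_lower(successes, trials, z):
    """One-sided Wilson-score lower bound for a binomial proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


def is_reliable_rule(records, premise, conclusion, theta_g, theta_a, delta):
    """Accept y -> x only if both lower bounds clear their thresholds.

    Assumption: Bonferroni split, so each one-sided bound gets error delta / 2
    and both hold together with probability of roughly at least 1 - delta.
    """
    n, n_y, n_xy = rule_counts(records, premise, conclusion)
    z = NormalDist().inv_cdf(1 - delta / 2)
    generality_lb = wilson_lower(n_y, n, z)    # lower bound on Pr(y)
    accuracy_lb = wilson_lower(n_xy, n_y, z)   # lower bound on Pr(x | y)
    return generality_lb >= theta_g and accuracy_lb >= theta_a


# Toy data: 60 of 100 records satisfy a=1, and 57 of those also have c="yes".
records = ([{"a": 1, "c": "yes"}] * 57
           + [{"a": 1, "c": "no"}] * 3
           + [{"a": 0, "c": "no"}] * 40)
print(is_reliable_rule(records, {"a": 1}, ("c", "yes"), 0.45, 0.85, 0.05))  # → True
print(is_reliable_rule(records, {"a": 1}, ("c", "yes"), 0.45, 0.90, 0.05))  # → False
```

In the toy run the point accuracy is 0.95, yet the rule is rejected at an accuracy threshold of 0.90 because the lower bound is only about 0.86 with 60 supporting records. This is the kind of filtering that reliability evaluation is meant to provide. A real discovery algorithm would also need a search over candidate premises and a pruning rule, neither of which is sketched here.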

Similar resources

The Relative Generality and Precision of Evidence-Based Medical Information Resources in the Recovery of Diabetes Information

Background and Aim: Relative generality and precision are two important criteria for measuring the efficiency and performance of information retrieval systems. The aim of this study was to compare the integrity and location of evidence-based databases in the digital library of Hamedan University of Medical Sciences in retrieving diabetes information. Methods: The design of this research is cross-sect...

Interestingness Measure for Mining Spatial Gene Expression Data using Association Rule

The search for interesting association rules is an important topic in knowledge discovery in spatial gene expression databases. The set of admissible rules for the selected support and confidence thresholds can easily be extracted by support- and confidence-based algorithms such as Apriori. However, these algorithms may produce a large number of rules, many of which are uninteresting. The challenge in as...

Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns of individuals and organizations about the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns, and thus data confidentiality is preserved ag...

Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm

...as a single objective one. Measures like support, confidence and other interestingness criteria used for evaluating a rule can be thought of as different objectives of the association rule mining problem. Support count is the number of records that satisfy all the conditions in the rule. This objective represents the accuracy of the rules extracted from the da...

Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)

Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, the decision-making process is intimately related to factors that determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated companies, where managers ne...

Publication date: 1998